NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Learning to Compute the Articulatory Representations of Speech with the MIRRORNET

https://doi.org/10.21437/Interspeech.2023-562

Siriwardena, Yashish M.; Espy-Wilson, Carol; Shamma, Shihab (August 2023, Interspeech 2023)

Full Text Available
Speaker-independent Speech Inversion for Estimation of Nasalance

https://doi.org/10.21437/Interspeech.2023-2352

Siriwardena, Yashish M.; Espy-Wilson, Carol; Boyce, Suzanne; Tiede, Mark; Oren, Liran (August 2023, Interspeech 2023)

Full Text Available
The Mirrornet: Learning Audio Synthesizer Controls Inspired by Sensorimotor Interaction

https://doi.org/10.1109/ICASSP43922.2022.9747358

Siriwardena, Yashish M.; Marion, Guilhem; Shamma, Shihab (May 2022, International Conference on Acoustics, Speech and Signal Processing)

Experiments probing sensorimotor neural interactions in the human cortical speech system support the existence of a bidirectional flow of interactions between the auditory and motor regions. The key function of these interactions is to enable the brain to ‘learn’ how to control the vocal tract for speech production. This idea is the impetus for the recently proposed "MirrorNet", a constrained autoencoder architecture. In this paper, the MirrorNet is applied to learn the controls of a specific audio synthesizer (DIVA) for producing melodies, in an unsupervised manner and only from the melodies' auditory spectrograms. The results demonstrate how the MirrorNet discovers synthesizer parameters that generate melodies closely resembling the originals, including unseen melodies, and even determines the best set of parameters to approximate renditions of complex piano melodies generated by a different synthesizer. This generalizability illustrates the MirrorNet's potential to discover the controls of arbitrary motor plants from sensory data.
Full Text Available
Multimodal Approach for Assessing Neuromotor Coordination in Schizophrenia Using Convolutional Neural Networks

https://doi.org/10.1145/3462244.3479967

Siriwardena, Yashish M.; Espy-Wilson, Carol; Kitchen, Chris; Kelly, Deanna L. (October 2021, Proceedings of the 2021 International Conference on Multimodal Interaction (ICMI ’21))

This study investigates the articulatory coordination of speech in subjects with schizophrenia who exhibit strong positive symptoms (e.g. hallucinations and delusions), using two distinct channel-delay correlation methods. We show that subjects with schizophrenia who have strong positive symptoms and are markedly ill display more complex articulatory coordination patterns in facial and speech gestures than healthy subjects do. This difference in coordination patterns is used to train a multimodal convolutional neural network (CNN) that uses video and audio data recorded during speech to distinguish patients with schizophrenia who have strong positive symptoms from healthy subjects. We also show that the vocal tract variables (TVs), which correspond to place of articulation and glottal source, outperform Mel-frequency Cepstral Coefficients (MFCCs) when fused with Facial Action Units (FAUs) in the proposed multimodal network. On the clinical dataset we collected, our best-performing multimodal network improves the mean F1 score for detecting schizophrenia by around 18% relative to the full vocal tract coordination (FVTC) baseline, which fuses FAUs and MFCCs.
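The abstract names channel-delay correlation methods without detailing them. As a rough illustration only, the Python sketch below (assuming NumPy; the function names `channel_delay_correlation` and `eigenspectrum`, the delay set, and the input layout are all hypothetical) shows one common way such a matrix can be formed: every channel is paired with time-shifted copies of itself and of every other channel, and the Pearson correlations among all channel-delay pairs are collected into one matrix. This is a sketch under stated assumptions, not the method or settings used in the paper.

```python
import numpy as np


def channel_delay_correlation(signals: np.ndarray, delays: list[int]) -> np.ndarray:
    """Correlation matrix over all (channel, delay) pairs.

    signals: array of shape (n_channels, n_samples), e.g. tract variables
             or facial action units at a common frame rate (assumed layout).
    delays:  non-negative integer frame shifts (hypothetical choice).
    """
    n_channels, n_samples = signals.shape
    max_delay = max(delays)
    length = n_samples - max_delay  # common length so every shifted copy fits
    if length < 2:
        raise ValueError("signal too short for the requested delays")
    # One row per (channel, delay): channel c shifted forward by d frames.
    rows = [signals[c, d:d + length] for c in range(n_channels) for d in delays]
    # np.corrcoef treats each row as a variable and returns Pearson correlations.
    return np.corrcoef(np.vstack(rows))


def eigenspectrum(corr: np.ndarray) -> np.ndarray:
    """Eigenvalues of the symmetric correlation matrix, largest first."""
    return np.linalg.eigvalsh(corr)[::-1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for 3 channels of 200 frames (not real data).
    x = rng.standard_normal((3, 200))
    R = channel_delay_correlation(x, delays=[0, 1, 2, 3])
    print(R.shape)  # (12, 12)
    print(eigenspectrum(R)[:3])
```

In this sketch, a spectrum dominated by a few large eigenvalues means the channels move together strongly, while a flatter spectrum means they vary more independently. How such matrices are arranged as CNN input is not described in the abstract.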
Full Text Available
